Improving Actor-Critic Reinforcement Learning via Hamiltonian Monte Carlo Method
Authors
Abstract
The actor-critic RL is widely used in various robotic control tasks. By viewing the actor-critic RL from the perspective of variational inference (VI), the policy network is trained to obtain the approximate posterior of actions given the optimality criteria. However, in practice, the policy network may yield suboptimal estimates due to the amortization gap and insufficient exploration. In this work, inspired by the previous use of Hamiltonian Monte Carlo (HMC) in VI, we propose to integrate the policy with HMC, which is termed as Hamiltonian Policy. As such, actions evolve from the base policy according to HMC, and our proposed method has many benefits. First, HMC can improve the policy distribution to better approximate the posterior and hence reduce the amortization gap. Second, HMC can also guide exploration more toward regions of the action space with higher Q values, enhancing exploration efficiency. Further, instead of directly applying HMC into RL, we propose a new leapfrog operator to simulate the Hamiltonian dynamics. Finally, in safe RL problems, we find that the proposed method can not only improve the achieved return, but also reduce safety constraint violations by discarding potentially unsafe actions. With comprehensive empirical experiments on continuous control baselines, including MuJoCo and PyBullet Roboschool, we show that our approach is a data-efficient and easy-to-implement improvement over existing methods.
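To make the leapfrog idea concrete, the sketch below illustrates how an action sampled from the base policy could be refined by a few leapfrog steps toward higher-Q regions of the action space, treating -Q(s, a) as the potential energy. This is a minimal illustrative sketch of standard leapfrog integration, not the paper's exact operator; the names `q_func`, `n_steps`, and `step_size` are assumptions introduced here for illustration.

```python
import torch

def hmc_refine_action(q_func, state, action, n_steps=3, step_size=0.05):
    """Refine an action with a few leapfrog steps of Hamiltonian dynamics.

    Illustrative sketch only: uses grad_a Q(s, a) as the force term,
    i.e. the potential energy is taken as U(a) = -Q(s, a).
    """
    a = action.detach().clone().requires_grad_(True)
    p = torch.randn_like(a)  # auxiliary momentum, as in standard HMC

    def grad_q(a_):
        # Gradient of Q(s, a) with respect to the action.
        return torch.autograd.grad(q_func(state, a_).sum(), a_)[0]

    # Leapfrog integration: half-step momentum, alternating full steps,
    # then a final half-step momentum update.
    p = p + 0.5 * step_size * grad_q(a)
    for i in range(n_steps):
        a = (a + step_size * p).detach().requires_grad_(True)  # position update
        if i < n_steps - 1:
            p = p + step_size * grad_q(a)                        # momentum update
    p = p + 0.5 * step_size * grad_q(a)

    return a.detach()
```

In a full HMC treatment one would typically also include a Metropolis acceptance step and tune the step size and number of steps; the sketch omits those details and only shows how the leapfrog dynamics can push sampled actions toward higher Q values.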
Similar resources
Dynamic Control with Actor-Critic Reinforcement Learning
Contents excerpt: 4 Actor-Critic Marble Control; 4.1 R-code; 4.2 The critic; 4.3 Unstable actors; 4.4 Trading off stability against...
Supervised Actor-Critic Reinforcement Learning
Editor’s Summary: Chapter ?? introduced policy gradients as a way to improve on stochastic search of the policy space when learning. This chapter presents supervised actor-critic reinforcement learning as another method for improving the effectiveness of learning. With this approach, a supervisor adds structure to a learning problem and supervised learning makes that structure part of an actor-...
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations
Pretraining with expert demonstrations has been found useful for speeding up the training of deep reinforcement learning algorithms, since less online simulation data is required. Some methods use supervised learning to speed up feature learning, while others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-c...
Intensive versus Non-intensive Actor-Critic Reinforcement Learning Algorithms
Reinforcement learning algorithms usually employ the agent's consecutive actions to construct gradient estimators that adjust the agent's policy. The policy is the result of some kind of stochastic approximation. Because of the slowness of stochastic approximation, such algorithms are usually much too slow to be employed, e.g., in real-time adaptive control. In this paper we analyze the replacing of the...
Actor-Critic Reinforcement Learning with Energy-Based Policies
We consider reinforcement learning in Markov decision processes with high-dimensional state and action spaces. We parametrize policies using energy-based models (particularly restricted Boltzmann machines) and train them using policy gradient learning. Our approach builds upon Sallans and Hinton (2004), who parameterized value functions using energy-based models, trained using a non-linear var...
Journal
Journal title: IEEE Transactions on Artificial Intelligence
Year: 2022
ISSN: 2691-4581
DOI: https://doi.org/10.1109/tai.2022.3215614